ANN Building Blocks¶

Biological Neurons¶

[Figure: Neuron]

Algorithm¶

  • Multiple inputs (on/off):

    • from 1-several neurons
  • Processing:

    • Combination: of inputs
    • Activation: on or off state
  • Single output (on/off): to 1-several neurons
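
As an illustration of this on/off behaviour, here is a minimal sketch of a threshold unit in the style of McCulloch-Pitts (the inputs and the threshold value are made up for the example):

```python
# Illustrative sketch: a binary threshold unit.
# The threshold value is an arbitrary choice for this example.

def threshold_neuron(inputs, threshold=2):
    """Fire (output 1) if enough inputs are 'on', otherwise stay off (0)."""
    return 1 if sum(inputs) >= threshold else 0

print(threshold_neuron([1, 0, 1]))  # two active inputs -> fires (1)
print(threshold_neuron([1, 0, 0]))  # one active input  -> stays off (0)
```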

Artificial Neurons¶

Algorithm¶

  • Multiple inputs:

    • from 1-several neurons
  • Processing:

    • Combination: of inputs -- linear model
    • Activation: activation function
  • Single output: to 1-several neurons

Weighted linear combination of inputs:¶

$\begin{eqnarray*} z_j &=& \sum_{i} w_{i,j} a'_{i} + b_j\\ \textrm{weights}&& w_{i,j}\\ \textrm{bias}&& b_j \end{eqnarray*}$




Sigmoid (logistic) activation function:¶

$a_j = \sigma(z_j)$


The Sigmoid Neuron¶

Weighted linear combination of inputs:¶

  • $z_j = \sum_{i} w_{i,j} a'_{i} + b_j$

Sigmoid/logistic activation function¶

  • $a_j=\sigma(z_j) = \frac{1}{1+e^{-z_j}}$
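
A minimal sketch of a single sigmoid neuron in code, assuming NumPy; the input, weights, and bias are arbitrary illustrative values:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid/logistic activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Inputs a'_i from the previous layer, weights w_{i,j}, and bias b_j
# (arbitrary illustrative values).
a_in = np.array([0.2, 0.9, 0.4])
w = np.array([0.5, -0.6, 0.1])
b = 0.1

z = w @ a_in + b   # weighted linear combination: z_j = sum_i w_{i,j} a'_i + b_j
a = sigmoid(z)     # activation: a_j = sigma(z_j)
print(z, a)
```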



Compare with logistic GLM¶

  • Weighted linear combination of inputs:
    • $z = \sum_{i} \beta_{i} x_{i} + \alpha$
  • Sigmoid/logistic link function
    • $Pr[y=1|x] = p = \sigma(z) = \frac{1}{1+e^{-z}}$


... or equivalently

$\begin{eqnarray} \sigma^{-1}(p) &=& \log\left(\frac{p}{1-p}\right) = \textrm{logit}(p)\\ \textrm{logit}(p) &=& \sum_{i}\beta_{i} x_{i} + \alpha \end{eqnarray}$
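
To make the correspondence concrete, here is a sketch (assuming NumPy and scikit-learn, with synthetic data) showing that a fitted logistic GLM and a sigmoid neuron using the GLM's coefficients as weights and its intercept as bias produce identical probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + 0.3 + rng.normal(size=200) > 0).astype(int)

glm = LogisticRegression().fit(X, y)

# "Sigmoid neuron" view: weights w = GLM coefficients, bias b = GLM intercept
w, b = glm.coef_.ravel(), glm.intercept_[0]
p_neuron = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # a = sigma(z)
p_glm = glm.predict_proba(X)[:, 1]              # Pr[y=1|x] from the GLM

print(np.allclose(p_neuron, p_glm))             # True: same model, two names
```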



Example¶

Let inputs be:
$\begin{eqnarray} a'_1&=&1\\ a'_2&=&0\\ a'_3&=&1 \end{eqnarray}$

and, with weights $w_{1,1}=0.3$, $w_{2,1}=0.8$, $w_{3,1}=0.2$ and bias $b_1=-0.5$, we have
$\begin{eqnarray} z_1 &=& \sum_i w_{i,1}a'_i + b_1\\ a_1 &=& \sigma(z_1) \end{eqnarray}$


$z_1 = 0.3 \times 1 + 0.8 \times 0 + 0.2 \times 1 - 0.5 = 0$

$a_1 = \sigma(z_1) = \frac{1}{1+e^{-0}} = 0.5$
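
The same computation as a quick check in code (a sketch assuming NumPy):

```python
import numpy as np

# Verify the worked example: z_1 = w . a' + b_1 = 0, so a_1 = sigma(0) = 0.5
a_prime = np.array([1.0, 0.0, 1.0])
w_1 = np.array([0.3, 0.8, 0.2])
b_1 = -0.5

z_1 = w_1 @ a_prime + b_1
a_1 = 1.0 / (1.0 + np.exp(-z_1))
print(z_1, a_1)   # 0.0 0.5
```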

So, if a sigmoid artificial neuron is just another way of doing logistic regression...¶

... then what's all the fuss about?¶

The fuss happens when you connect several neurons into a network¶

Feed-forward artificial neural networks (ffANN)¶

Layers¶

  • "Columns" of 1-many neurons

  • A single Input layer

    • Input neurons receive data input and pass it on to the next layer
  • 1-many Hidden layer(s)
    • Artificial neurons process their input and deliver output to the next layer
  • A single Output layer
    • Artificial neurons process their input and deliver the final output $\hat{y}$
      • output $\hat{y}_j = a_j$
      • Continuous $\hat{y}$: Regression
      • Discrete $\hat{y}$: Classification

Connectivity between layers¶

  • ffANN are fully connected
    • each neuron in a layer is connected to every neuron in the next layer
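
A forward pass through a small fully connected ffANN can be sketched as follows (assuming NumPy; the layer sizes and random weights are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(a_prev, W, b):
    """Fully connected: every neuron sees every activation of the previous layer."""
    return sigmoid(W @ a_prev + b)

rng = np.random.default_rng(1)

x = np.array([1.0, 0.0, 1.0])                         # input layer: 3 values
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)  # output layer: 1 neuron

h = dense_layer(x, W1, b1)       # hidden activations
y_hat = dense_layer(h, W2, b2)   # final output \hat{y}
print(y_hat)
```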

ffANN examples¶

Another drawing style, omitting $w$ and $b$: [Figure: ANN1]

ffANN examples¶

Often, layers are 'boxed': [Figure: ANN2]

ffANN examples¶

Layers with >1 dimension (e.g., images) get messy! [Figure]

Simplify! Nodes and arrows are left implicit: [Figure]

Collect similar layers into 'blocks': [Figure]

ffANN examples¶

There are also other types of layers/blocks (cf. coming lectures): [Figure]

Hidden Layers¶

Intuitive function of hidden layers?¶

  • Each layer can be viewed as transforming the original data into a new multi-dimensional space.
  • A hidden layer should, in practice, have at least two neurons to be meaningful
    • A single-neuron layer collapses information and forms a bottleneck
    • An early bottleneck heavily constrains the NN

Depth¶

  • the number of hidden layers + the output layer

Deep Learning¶

  • Formally, ANNs with depth > 1
    • (often include more advanced layers as well)


Why Deep Learning?¶

For Regression¶

  • Single layer $\approx$ logistic regression
  • More layers $\rightarrow$
    • more complex, non-linear models

For Classification¶

  • Single layer $\approx$ one hyper-plane
  • Adding layers $\rightarrow$
    • more hyper-planes $\rightarrow$
    • more advanced classification
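
A classic concrete case is XOR: no single hyper-plane separates it, but one hidden layer with two neurons (two hyper-planes) does. A sketch assuming NumPy, with hand-picked (not learned) weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden neuron 1 fires for OR(x1, x2); hidden neuron 2 fires for AND(x1, x2).
# The output neuron then computes roughly OR AND (NOT AND) = XOR.
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])
W2 = np.array([20.0, -20.0])
b2 = -10.0

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    h = sigmoid(W1 @ np.array(x, dtype=float) + b1)   # two hyper-planes
    y = sigmoid(W2 @ h + b2)                          # combined by the output neuron
    print(x, round(float(y)))                         # prints 0, 1, 1, 0: the XOR pattern
```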

Mini exercise¶

  • http://playground.tensorflow.org/
    • Try different input "problems"
    • Investigate how different depths affect classification
      • number of hidden layers
      • number of neurons per layer
    • Run for several epochs (=iterations)